Position heaps: A simple and dynamic text indexing data structure
Authors
Abstract
We address the problem of finding the locations of all instances of a string P in a text T, where preprocessing of T is allowed in order to facilitate the queries. Previous data structures for this problem include the suffix tree, the suffix array, and the compact DAWG. We modify a data structure called a sequence tree, which was proposed by Coffman and Eve for hashing [1], and adapt it to the new problem. We can then produce a list of k occurrences of any string P in T in O(||P|| + k) time. Because of properties shared by suffixes of a text that are not shared by arbitrary hash keys, we can build the structure in O(||T||) time, which is much faster than Coffman and Eve’s algorithm. These bounds are as good as those for the suffix tree, suffix array, and the compact DAWG. The advantages are the elementary nature of some of the algorithms for constructing and using the data structure and the asymptotic bounds we can give for updating the data structure when the text is edited.
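As a rough illustration of the idea in the abstract, the following Python sketch builds a position heap naively and queries it. It is not the authors' linear-time construction, and the query verifies candidates by direct comparison rather than with the auxiliary pointers needed for the stated O(||P|| + k) bound. The names `Node`, `build_position_heap`, and `find_all` are hypothetical, chosen only for this example. The sketch assumes suffixes are inserted from shortest to longest, so that each insertion walks down the trie until it falls off and adds exactly one new node labeled with the suffix's starting position.

```python
class Node:
    """Trie node holding one text position (None for the root)."""
    __slots__ = ("pos", "children")

    def __init__(self, pos=None):
        self.pos = pos
        self.children = {}


def build_position_heap(text):
    """Naive construction (not linear time): insert suffixes right to left."""
    root = Node()
    for i in range(len(text) - 1, -1, -1):
        node, j = root, i
        # Follow the suffix text[i:] as far as the trie allows. The trie has
        # fewer non-root nodes than the suffix has characters, so j stays
        # inside the text.
        while text[j] in node.children:
            node = node.children[text[j]]
            j += 1
        # Falling off the trie: add one new node labeled with position i.
        node.children[text[j]] = Node(i)
    return root


def find_all(root, text, pattern):
    """Return the sorted start positions of pattern in text."""
    hits = []
    node = root
    for depth, ch in enumerate(pattern):
        node = node.children.get(ch)
        if node is None:
            return sorted(hits)
        # Nodes strictly above depth len(pattern) are only candidates:
        # their path is a proper prefix of the pattern, so verify directly.
        if depth + 1 < len(pattern) and text.startswith(pattern, node.pos):
            hits.append(node.pos)
    # The whole pattern was consumed: every position in this subtree is an
    # occurrence, because each node's path is a prefix of its own suffix.
    stack = [node]
    while stack:
        cur = stack.pop()
        if cur.pos is not None:
            hits.append(cur.pos)
        stack.extend(cur.children.values())
    return sorted(hits)


text = "abaababa"
heap = build_position_heap(text)
print(find_all(heap, text, "aba"))  # [0, 3, 5]
```

The final sort is only for readable output; it is not part of the data structure and would add a logarithmic factor to reporting.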
Similar resources
The Position Heap of a Trie
The position heap is a text indexing structure for a single text string, recently proposed by Ehrenfeucht et al. [Position heaps: A simple and dynamic text indexing data structure, Journal of Discrete Algorithms, 9(1):100-121, 2011]. In this paper we introduce the position heap for a set of strings, and propose an efficient algorithm to construct the position heap for a set of strings which is ...
Position Heaps for Parameterized Strings
We propose a new indexing structure for parameterized strings, called the parameterized position heap. The parameterized position heap is applicable to the parameterized pattern matching problem, where the pattern matches a substring of the text if there exists a bijective mapping from the symbols of the pattern to the symbols of the substring. We propose an online construction algorithm of parameterized ...
Heaps Simplified
The heap is a basic data structure used in a wide variety of applications, including shortest path and minimum spanning tree algorithms. In this paper we explore the design space of comparison-based, amortized-efficient heap implementations. From a consideration of dynamic single-elimination tournaments, we obtain the binomial queue, a classical heap implementation, in a simple and natural way....
Verifying Heaps' law using Google Books Ngram data
This article is devoted to the verification of the empirical Heaps law in European languages using Google Books Ngram corpus data. The connection between word distribution frequency and expected dependence of individual word number on text size is analysed in terms of a simple probability model of text generation. It is shown that the Heaps exponent varies significantly within characteristic ti...
An Indexing Algorithm for Text Retrieval
The rapid growth of world-wide information systems results in new requirements for text indexing and retrieval. In this paper we propose an algorithm for query evaluation in text retrieval systems based on well-known inverted lists augmented with an additional data structure, and estimate the expected performance gains. In addition to improved performance, this data structure is able to support dynamic...
Journal title:
- J. Discrete Algorithms
Volume 9, Issue
Pages -
Publication date: 2011